MERIT: a Mutation Error Rate Identification Toolkit for Ultra-deep Sequencing Applications
Abstract
Rapid progress in high-throughput sequencing (HTS) has enabled the molecular characterization of mutational landscapes in heterogeneous populations and has improved our understanding of clonal evolution processes. Analyzing the sensitivity of detecting genomic mutations in HTS requires comprehensive profiling of sequencing artifacts. To this end, we introduce MERIT, designed for in-depth quantification of erroneous substitutions and small insertions and deletions (indels), specifically for ultra-deep applications. MERIT incorporates an all-inclusive variant caller and considers genomic context, including the nucleotides immediately 5′ and 3′ of each site, thereby establishing error rates for 96 possible substitutions as well as four single-base and 16 double-base indels. We apply MERIT to ultra-deep sequencing data (1,300,000×) and show a significant relationship between error rates and genomic contexts. We devise an in silico approach to determine the optimal sequencing depth, where errors occur at rates similar to those of true mutations. Finally, we assess the nucleotide-incorporation fidelity of four high-fidelity DNA polymerases in clinically relevant loci, and demonstrate how fixed detection thresholds may result in substantial false positive as well as false negative calls.
Introduction
Advances in high-throughput sequencing (HTS) technologies have revolutionized the genomic, transcriptomic, and epigenomic characterization of biological states. HTS platforms produce large
Correspondence: [email protected]
bioRxiv preprint first posted online Sep. 4, 2017; doi: http://dx.doi.org/10.1101/184291. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
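The class counts quoted in the abstract (96 substitutions, four single-base indels, 16 double-base indels) can be reproduced by simple enumeration. The sketch below is illustrative only and is not taken from MERIT: it assumes substitutions are collapsed onto the pyrimidine reference base (6 substitution types × 4 bases at 5′ × 4 bases at 3′), and that indel classes are defined by the inserted or deleted sequence. The function names and the label format are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Assumption: complementary substitutions are merged by reporting the
# pyrimidine (C or T) reference base, leaving 6 substitution types.
REF_ALT = [("C", "A"), ("C", "G"), ("C", "T"),
           ("T", "A"), ("T", "C"), ("T", "G")]


def substitution_contexts():
    """Return labels such as 'A[C>T]G' for every 5'/3' context."""
    return [
        f"{five}[{ref}>{alt}]{three}"
        for (ref, alt), five, three in product(REF_ALT, BASES, BASES)
    ]


def indel_classes(length):
    """Return every possible inserted/deleted sequence of a given length."""
    return ["".join(seq) for seq in product(BASES, repeat=length)]


if __name__ == "__main__":
    subs = substitution_contexts()
    print(len(subs), len(indel_classes(1)), len(indel_classes(2)))
    # prints: 96 4 16
```

Keeping one error-rate estimate per label, rather than a single global rate, is what makes a context-specific detection threshold possible in place of a fixed one.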
Similar resources
Quantitative Identification of Mutant Alleles Derived from Lung Cancer in Plasma Cell-Free DNA via Anomaly Detection Using Deep Sequencing Data
The detection of rare mutants using next generation sequencing has considerable potential for diagnostic applications. Detecting circulating tumor DNA is the foremost application of this approach. The major obstacle to its use is the high read error rate of next-generation sequencers. Rather than increasing the accuracy of final sequences, we detected rare mutations using a semiconductor sequen...
Detection of ultra-rare mutations by next-generation sequencing.
Next-generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of ~1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely proble...
Ultra-deep sequencing for the analysis of viral populations.
Next-generation sequencing allows for cost-effective probing of virus populations at an unprecedented level of detail. The massively parallel sequencing approach can detect low-frequency mutations and it provides a snapshot of the entire virus population. However, analyzing ultra-deep sequencing data obtained from diverse virus populations is challenging because of PCR and sequencing errors and...
De novo meta-assembly of ultra-deep sequencing data
We introduce a new divide and conquer approach to deal with the problem of de novo genome assembly in the presence of ultra-deep sequencing data (i.e. coverage of 1000x or higher). Our proposed meta-assembler Slicembler partitions the input data into optimal-sized 'slices' and uses a standard assembly tool (e.g. Velvet, SPAdes, IDBA_UD and Ray) to assemble each slice individually. Sl...
Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance.
The detection of mutant spectra within a population of microorganisms is critical for the management of drug-resistant infections. We performed ultra-deep pyrosequencing to detect minor sequence variants in HIV-1 protease and reverse transcriptase (RT) genes from clinical plasma samples. We estimated empirical error rates from four HIV-1 plasmid clones and used them to develop a statistical app...